Universal Prediction over Large Alphabets

Authors

  • Narayana Santhanam
  • Venkat Anantharam
Abstract

Insurance transfers losses associated with risks to the insurer for a price, the premium. Considering a natural probabilistic framework for the insurance problem, we derive a necessary and sufficient condition on loss models such that the insurer remains solvent despite the losses taken on. In particular, there need not be any upper bound on the loss; rather, it is the structure of the model space that decides insurability.

Insurance is a way of managing losses associated with risks (for example, floods, network outages, and earthquakes), primarily by transferring risk to another entity, the insurer, for a price, the premium. The insurer attempts to break even by balancing the possible loss that may be suffered by a few (risk) with the guaranteed payments of many (premium). In 1903, Filip Lundberg [1] defined and formulated this scenario in its natural probabilistic setting as part of his thesis. In particular, Lundberg formulated a collective risk problem pooling together the risk of all the insured.

There is an underlying risk model, a probability measure on loss sequences. Typically, the model itself is unknown, but can be imagined to belong to a known class of risk models. Suppose the insurance company sets some premium to be paid by the insured regularly, say, once at the beginning of every time interval. The losses incurred by the insured will be of uncertain size in every time interval, governed by the unknown underlying risk model. For a given class of risk models, how should the premiums be set so that the insurer compensates all losses in full, yet remains solvent?

Related to the insurance problem is the pricing problem that several researchers [2, 3] have considered for the Internet; these adopt, among other techniques, game-theoretic principles to tackle the problem. A different approach, including that of Lundberg [1], involves studying the loss parametrically, using, for example, Poisson processes as the class of risk models.
A more comprehensive theory of risk modeling has evolved [4], which incorporates several model classes for the loss other than Poisson processes, and which also includes some fat-tailed distribution classes.

The latter approach is very reminiscent of work in probability estimation, universal compression and prediction. Lately, there has been a lot of focus on choosing model classes for new applications such as language modeling, text compression, clustering and classification. Researchers have come up with new classes of models, e.g. [5, 6], as well as theoretical and practical approaches that balance the complexity of the model classes with their descriptive power [7]. In particular, one would like to use a model class that is as general as possible, and yet tractable.

This focus in the compression literature is very pertinent to a new slew of scenarios for risk management. In settings like network outages, it is not clear what should constitute a reasonable risk model in the absence of usable information about what might cause the outages. To model these risks, how does one choose a class that is as general as possible, yet one on which the insurer can set premiums and remain solvent? A preliminary question is, then: what are necessary and sufficient conditions for a class of measures on infinite loss sequences to be insurable? In this paper, we provide a partial answer. If losses can be modelled as i.i.d. samples from a set P of distributions, we determine a necessary and sufficient condition on P for insurability.

We adopt the collective risk approach, namely, we abstract the problem without loss of generality to include just two players in the insurance game: the insured and the insurer. We denote the sequence of losses by {X_i}_{i≥1}, and we assume that X_i ∈ N for all i ≥ 1, where N denotes the set of natural numbers, {0, 1, 2, ...}. P^∞ is a collection of measures on infinite-length loss sequences. In this paper, we deal only with i.i.d. measures. Consequently, we denote by P the set of distributions on N obtained as single-letter marginals of P^∞. Let N* be the collection of all finite-length strings of natural numbers.

The insurer's scheme Φ is a mapping from N* to R, and is interpreted as the premium demanded by the insurer from the insured after a loss sequence is observed. The insurer can observe the losses for a time prior to entering the insurance game. However, we require that the insurer enter the game with probability 1 no matter what loss model is in force, and the insurer cannot quit once entered. We adopt another abstraction without loss of generality: at any stage, if the insurer is surprised by a loss bigger than the premium charged in that round, the insurer goes bankrupt. To see why this simplification does not involve any loss of generality, imagine the sequence of premiums set in the paper to represent the cumulative premium thus far. To eliminate trivial schemes that do not enter the game at all, we require that for all p ∈ P^∞, the insurer enters the game with probability 1.

A class P^∞ of measures is insurable if, for all η > 0, there exists a premium scheme Φ such that for all p ∈ P^∞, p(Φ goes bankrupt) < η and, in addition, for all p ∈ P^∞, lim_{n→∞} p({X^n : Φ(X^n) < ∞}) = 1. In Section 2, we consider an example each of insurable and non-insurable classes.
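The definition of insurability above can be made concrete with a small Monte Carlo sketch. Everything in it is an illustrative assumption rather than the paper's construction: the toy model class (losses uniform on {0, ..., m} with m unknown to the insurer), the hypothetical scheme `doubling_scheme` (observe `wait` losses, then charge twice the largest loss seen so far), and the convention that an infinite premium stands for "the insurer has not entered yet", which mirrors the condition Φ < ∞ in the definition.

```python
import math
import random


def doubling_scheme(wait):
    """Hypothetical premium scheme (not from the paper); requires wait >= 1.

    Observe `wait` losses without entering, then charge twice the largest
    loss seen so far in every later round.
    """
    def scheme(history):
        if len(history) < wait:
            return math.inf  # infinite premium: insurer has not entered yet
        return 2 * max(history)
    return scheme


def uniform_loss(m):
    """Toy i.i.d. loss model: each loss is uniform on {0, 1, ..., m}."""
    return lambda rng: rng.randint(0, m)


def estimate_bankruptcy(sample_loss, scheme, horizon, trials, seed=0):
    """Monte Carlo estimate of p(scheme goes bankrupt within `horizon` rounds)."""
    rng = random.Random(seed)
    bankrupt = 0
    for _ in range(trials):
        history = []
        for _ in range(horizon):
            premium = scheme(history)  # premium set from past losses only
            loss = sample_loss(rng)
            if loss > premium:         # surprised by a loss above the premium
                bankrupt += 1
                break
            history.append(loss)
    return bankrupt / trials


if __name__ == "__main__":
    for wait in (1, 10):
        p = estimate_bankruptcy(uniform_loss(100), doubling_scheme(wait),
                                horizon=50, trials=2000)
        print(f"wait={wait}: estimated bankruptcy probability {p:.3f}")
```

In this toy class, a longer observation period makes it more likely that the largest observed loss is at least m/2 before the insurer enters, so the estimated bankruptcy probability should drop as `wait` grows. This is the kind of trade-off the parameter η in the definition controls.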

Similar resources

A lower bound on compression of unknown alphabets

Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength n, henc...

Estimation and Compression Over Large Alphabets

OF THE DISSERTATION Estimation and Compression Over Large Alphabets

Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets

We consider the problem of inducing grapheme-to-phoneme mappings for unknown languages written in a Latin alphabet. First, we collect a data-set of 107 languages with known grapheme-phoneme relationships, along with a short text in each language. We then cast our task in the framework of supervised learning, where each known language serves as a training example, and predictions are made on unk...

Universal Lossless Compression with Unknown Alphabets - The Average Case

Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alp...

arXiv:cs/0603068v1 [cs.IT] 17 Mar 2006: Universal Lossless Compression with Unknown Alphabets - The Average

Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alp...

Performance of universal codes over infinite alphabets - Data Compression Conference, 2003. Proceedings. DCC 2003

It is known that universal compression of strings generated by i.i.d. sources over infinite alphabets entails infinite per-symbol redundancy. Continuing previous work [1], we consider alternative compression schemes which decompose the description of such strings into a description of the symbols appearing in the string and a description of the arrangement the symbols form. We consider two desc...

Publication date: 2011